Skip to content

feat(safeoutputs): add daily smoke suite and reject unknown safe-output keys#563

Merged
jamesadevine merged 3 commits into
mainfrom
feat/safe-output-smoke-suite
May 16, 2026
Merged

feat(safeoutputs): add daily smoke suite and reject unknown safe-output keys#563
jamesadevine merged 3 commits into
mainfrom
feat/safe-output-smoke-suite

Conversation

@jamesadevine
Copy link
Copy Markdown
Collaborator

Summary

Adds an end-to-end daily smoke suite for every production safe output and tightens the compiler so silently-dropped safe-outputs: keys can no longer hide.

26 daily smoke fixtures under tests/safe-outputs/

Each fixture is a minimal agentic-pipeline .md that calls exactly one safe-output tool with predictable literal values, plus a committed .lock.yml. Schedule baked into every fixture; after the operator registers them once in the AgentPlayground ADO sandbox the suite runs unattended.

  • 22 production safe outputs (daily around 03:00) — noop, missing-data, missing-tool, report-incomplete, create-work-item, comment-on-work-item, update-work-item, link-work-items, create-branch, create-git-tag, create-wiki-page, update-wiki-page, add-build-tag, queue-build, create-pull-request, add-pr-comment, reply-to-pr-comment, resolve-pr-thread, submit-pr-review, update-pr, upload-build-attachment, upload-workitem-attachment, upload-pipeline-artifact.
  • noop-target.md — trivial target pipeline that the queue-build smoke fires at.
  • janitor.md (weekly on monday around 02:00) — prunes ado-aw-smoke-* artifacts older than 30 days.
  • smoke-failure-reporter.md (daily around 04:30) — queries the ADO REST builds endpoint, finds failed smoke runs, and uses ado-aw-debug.create-issue (feat(safeoutputs): add ado-aw-debug.create-issue for dogfood pipelines #492) to file [smoke-failure] ... issues against githubnext/ado-aw. allowed-labels: [], max: 5, title-prefix dedupes against open issues. The ADO_AW_DEBUG_GITHUB_TOKEN PAT must live only on this pipeline.

README.md documents the suite layout; REGISTERED.md is a contributor-maintained fixture → ADO pipeline ID table plus the manual-handoff checklist (service connections, perma-PR, variable group, token provisioning).

Compiler: fail loud on unknown safe-output keys

generate_enabled_tools_args used to skip unknown keys with only a console warning. Authoring this PR surfaced two real-world consequences:

  1. The fixture filename rename create_prcreate_pull_request (see below) initially used create-pr: in fixtures and compiled cleanly with the tool dropped at runtime.
  2. The long-standing reply-to-pr-comment MCP-name discrepancy (see below) is exactly this class of silent failure.

The new validate_safe_outputs_keys in src/compile/common.rs runs in the validation block before pipeline emission. Unknown keys now bail with a structured error listing all known tools that share the typo's first hyphen-segment, e.g.:

safe-outputs contains unrecognised tool name(s):
  - create-pr (similar known tools: create-branch, create-git-tag, create-pull-request, create-wiki-page, create-work-item)

Valid safe-output keys are listed in docs/safe-outputs.md. Each key must
match exactly the kebab-case name registered by a tool in src/safeoutputs/
(e.g. `create-pull-request`, not `create-pr`).

The memory migration key remains a soft warning (back-compat); DEBUG_ONLY_TOOLS keys keep their more specific error from validate_ado_aw_debug_config. The plain Levenshtein "did you mean X?" heuristic was rejected — it returned misleading suggestions like update-pr for create-pr because both end in -pr. Listing every same-prefix candidate is honest and lets the operator pick.

File ↔ YAML-key alignment

  • src/safeoutputs/create_pr.rssrc/safeoutputs/create_pull_request.rs (now matches YAML key create-pull-request). git mv preserves history.
  • Fix real bug: reply_to_pr_comment.rs declared MCP tool name reply-to-pr-review-comment, while docs/safe-outputs.md, README.md, prompts/*, and site/ all said reply-to-pr-comment. Updated Rust to match docs across tool_result!, mcp.rs registration, execute.rs dispatch, four unit tests, and internal log strings. Technically a behavioural rename of the wire-level MCP name, but no user could have invoked the previously-registered name successfully (it wasn't documented anywhere), so this is a fix rather than a breaking change.

Test plan

  • cargo build — clean (warnings unchanged from baseline).
  • cargo test — every existing test passes plus 9 new unit tests for validate_safe_outputs_keys and related_safe_output_names:
    • known-keys accept,
    • empty safe-outputs: accept,
    • typo reject + create-pull-request appears in the similar-tools hint for create-pr,
    • distant typo reject without misleading hint,
    • create-issue not double-flagged (deferred to validate_ado_aw_debug_config),
    • memory migration soft-path,
    • invalid characters reject with a structural error,
    • related_safe_output_names("create-pr") includes every create-* and excludes update-pr,
    • related_safe_output_names("fabricated-tool-name") returns empty.
  • cargo clippy --all-targets --all-features -- -D warnings — 100+ baseline errors, 0 net-new from changed files (src/compile/common.rs, src/safeoutputs/create_pull_request.rs, src/safeoutputs/reply_to_pr_comment.rs, src/safeoutputs/mod.rs, src/mcp.rs, src/execute.rs).
  • tests/bash_lint_tests.rs with ENFORCE_BASH_LINT=1 and shellcheck on PATH — passes (existing fixtures only; the new smoke fixtures are validated separately in the sweep below).
  • Fixture sweep — all 26 fixtures pass ado-aw compile and ado-aw check; the resulting bash bodies (mostly set -euo pipefail + printf setup steps) pass shellcheck.

Follow-ups (out of scope, manual)

  1. Register one ADO pipeline per *.lock.yml in https://dev.azure.com/msazuresphere/AgentPlayground, fill in tests/safe-outputs/REGISTERED.md in a follow-up docs PR.
  2. Provision ADO_AW_DEBUG_GITHUB_TOKEN (secret, scoped Issues: Read and write on githubnext/ado-aw only) on the smoke-failure-reporter pipeline.
  3. Confirm engine.model: gpt-5-mini is available to copilot in the tenant before enabling the smokes; if not, update the model field across the fixture set.
  4. Flesh out setup: placeholders in janitor.md and resolve-pr-thread.md with real az / REST commands.

…ut keys

* Add tests/safe-outputs/ — 26 daily-scheduled agentic-pipeline fixtures covering all 22 production safe outputs plus a janitor, a queue-build target, and a smoke-failure-reporter that uses ado-aw-debug.create-issue (PR #492).

* Add validate_safe_outputs_keys to fail compile on unrecognised safe-outputs keys (previously silently dropped with only a console warning). Includes a related_safe_output_names helper that lists known tools sharing the typo's first hyphen-segment as a hint.

* Rename create_pr.rs -> create_pull_request.rs so the file name matches the YAML key. Fix the long-standing reply_to_pr_comment.rs name mismatch: the MCP tool was registered as reply-to-pr-review-comment while every user-facing doc, README, prompt, and site page said reply-to-pr-comment. Tool name now matches the docs across mcp.rs tool registration, execute.rs dispatch, and unit tests.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Copy link
Copy Markdown
Contributor

🔍 Rust PR Review

Summary: Looks good — correctness solid, tests comprehensive. A few minor points worth addressing.

Findings

⚠️ Suggestions

  • src/compile/common.rs:1636-1642 — The warning-and-skip path for unknown safe-output tool names in generate_enabled_tools_args is now dead code on the normal compile path. validate_safe_outputs_keys runs at line 3000 (validation phase) and bails hard before generate_enabled_tools_args can ever see an unknown key. The comment at line 1657 ("Every user-specified key was either invalid or unrecognized") is no longer accurate for the unknown case. Since generate_enabled_tools_args is pub, the defensive check isn't harmful, but it creates a misleading impression that warnings are the error surface here. Consider adding a comment like // unreachable in practice: validate_safe_outputs_keys bails earlier or removing the unrecognized-key branch now that the validator owns that responsibility.

  • src/compile/common.rs:1904-1910validate_safe_outputs_keys bails on the first invalid-character key but collects all unknown-but-syntactically-valid keys before reporting. If a user has two bad-character keys they'll only see the first. Minor UX inconsistency — consider collecting character-invalid keys into the same unknown vec (with a different message) so the report is complete in one pass.

  • src/compile/common.rs:1964key.split('-').next().unwrap_or(key) — the .unwrap_or(key) fallback is unreachable: str::split().next() always returns Some(...) (even for "", it yields Some("")). The empty-string case is already handled by the if head.is_empty() guard below. The unwrap_or is harmless but a reader might wonder why it's there. unwrap_or_default() or an inline comment would be cleaner.

✅ What Looks Good

  • The validate_safe_outputs_keys / validate_ado_aw_debug_config split is well-designed: debug-only tools get a purpose-specific error, unknowns get the typo-hint error, memory gets the soft migration warning. The ordering (line 3000 then 3006) makes the sequencing explicit.
  • The reply-to-pr-comment rename is thorough — all five sites updated (mcp.rs #[tool(name=...)], execute.rs dispatch, tool_result! macro, log strings, four unit test assertions).
  • 9 unit tests cover all meaningful branches of validate_safe_outputs_keys and related_safe_output_names, including the dedup-for-debug-only and soft-migration cases.
  • Rejecting create-pr-style typos with a create-* candidate list (rather than a Levenshtein nearest-neighbor) is the right call — the PR description's reasoning about update-pr false-matches is solid.

Generated by Rust PR Reviewer for issue #563 · ● 1.1M ·

@jamesadevine
Copy link
Copy Markdown
Collaborator Author

@copilot implement suggestions

…s_keys

Agent-Logs-Url: https://github.com/githubnext/ado-aw/sessions/3358604b-8ec8-4b6e-918b-81f8b68ef58f

Co-authored-by: jamesadevine <4742697+jamesadevine@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

Copilot AI commented May 16, 2026

@copilot implement suggestions

Done in 90467d7. All three suggestions from the automated review are now addressed:

  1. Dead branch in generate_enabled_tools_args — replaced the eprintln warning with a // Unreachable in practice: validate_safe_outputs_keys bails before… comment; updated the stale effective_mcp_tool_count == 0 comment too.
  2. Collect all invalid-character keysvalidate_safe_outputs_keys now collects every key that fails is_safe_tool_name into invalid_names and reports them all in one error, rather than bailing on the first.
  3. unwrap_or(key)unwrap_or_default()str::split().next() never returns None, so the fallback was unreachable; unwrap_or_default() (yields "") is cleaner and the existing head.is_empty() guard handles that case.

A new unit test test_validate_safe_outputs_keys_reports_all_invalid_characters covers the multi-key path.

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • https://api.github.com/repos/githubnext/ado-aw/issues
    • Triggering command: /home/REDACTED/work/ado-aw/ado-aw/target/debug/deps/ado_aw-3136f96127ec4700 /home/REDACTED/work/ado-aw/ado-aw/target/debug/deps/ado_aw-3136f96127ec4700 /home/REDACTED/work/ado-aw/ado-aw/target/debug/deps/codemod_tests-63d282cbd9099a7d.1nkssbntk2etip7qbmul5vt08.1tz163s.rcgu.o /home/REDACTED/work/ado-aw/ado-aw/target/debug/deps/codemod_tests-63d282cbd9099a7d.1pwtpr5rv2958uanfl895250m.1tz163s.rcgu.o /home/REDACTED/work/ado-aw/ado-aw/target/debug/deps/codemod_tests-63d282cbd9099a7d.1tqokxs2atpyyh0j3a5jx8yms.1tz163s.rcgu.o /home/REDACTED/work/ado-aw/ado-aw/target/debug/deps/codemod_tests-63d282cbd9099a7d.208vfdai33n0gpqg4l1kdlzo9.1tz163s.rcgu.o /home/REDACTED/work/ado-aw/ado-aw/target/debug/deps/codemod_tests-63d282cbd9099a7d.2jdi5ky8gjgd29ino23xlf9sv.1tz163s.rcgu.o 45.00hakwswftj6wrtz9lkb4tl5a.0zek4i9.rcgu.o 45.015fq4h8sttxoyzgnd1z46hc4.0zek4i9.rcgu.o 45.08anb3in7gs8u7so2z1uwo3e5.0zek4i9.rcgu.o 45.08hb6jqggco9c9o7ouh6hh26v.0zek4i9.rcgu.o 45.0gqfg4m63fpgcsla0fcelaguh.0zek4i9.rcgu.o 45.0yb5s6193a0wggzgv0v05kyl6.0zek4i9.rcgu.o 45.1ajfx1lje5gnhd2s2oittvglo.0zek4i9.rcgu.o 45.20awdpymdryjxm2uuz5t4bffl.0zek4i9.rcgu.o 45.24jvrpchkioadaybs0mkzmpxj.0zek4i9.rcgu.o 45.29d554253a7ua8jdy2djiuujt.0zek4i9.rcgu.o 45.2bikq83kqxzd59u4d0aoev3vs.0zek4i9.rcgu.o 45.2boqvweqww93ilwsx0c22c3r0.0zek4i9.rcgu.o 45.2d2vuk299nc534pripnny1rfr.0zek4i9.rcgu.o 45.2hkf7twmmm5z0ktz2vjsee9ue.0zek4i9.rcgu.o (http block)
  • spsprodeus21.vssps.visualstudio.com
    • Triggering command: /home/REDACTED/work/ado-aw/ado-aw/target/debug/deps/ado_aw-3136f96127ec4700 /home/REDACTED/work/ado-aw/ado-aw/target/debug/deps/ado_aw-3136f96127ec4700 /home/REDACTED/work/ado-aw/ado-aw/target/debug/deps/codemod_tests-63d282cbd9099a7d.1nkssbntk2etip7qbmul5vt08.1tz163s.rcgu.o /home/REDACTED/work/ado-aw/ado-aw/target/debug/deps/codemod_tests-63d282cbd9099a7d.1pwtpr5rv2958uanfl895250m.1tz163s.rcgu.o /home/REDACTED/work/ado-aw/ado-aw/target/debug/deps/codemod_tests-63d282cbd9099a7d.1tqokxs2atpyyh0j3a5jx8yms.1tz163s.rcgu.o /home/REDACTED/work/ado-aw/ado-aw/target/debug/deps/codemod_tests-63d282cbd9099a7d.208vfdai33n0gpqg4l1kdlzo9.1tz163s.rcgu.o /home/REDACTED/work/ado-aw/ado-aw/target/debug/deps/codemod_tests-63d282cbd9099a7d.2jdi5ky8gjgd29ino23xlf9sv.1tz163s.rcgu.o 45.00hakwswftj6wrtz9lkb4tl5a.0zek4i9.rcgu.o 45.015fq4h8sttxoyzgnd1z46hc4.0zek4i9.rcgu.o 45.08anb3in7gs8u7so2z1uwo3e5.0zek4i9.rcgu.o 45.08hb6jqggco9c9o7ouh6hh26v.0zek4i9.rcgu.o 45.0gqfg4m63fpgcsla0fcelaguh.0zek4i9.rcgu.o 45.0yb5s6193a0wggzgv0v05kyl6.0zek4i9.rcgu.o 45.1ajfx1lje5gnhd2s2oittvglo.0zek4i9.rcgu.o 45.20awdpymdryjxm2uuz5t4bffl.0zek4i9.rcgu.o 45.24jvrpchkioadaybs0mkzmpxj.0zek4i9.rcgu.o 45.29d554253a7ua8jdy2djiuujt.0zek4i9.rcgu.o 45.2bikq83kqxzd59u4d0aoev3vs.0zek4i9.rcgu.o 45.2boqvweqww93ilwsx0c22c3r0.0zek4i9.rcgu.o 45.2d2vuk299nc534pripnny1rfr.0zek4i9.rcgu.o 45.2hkf7twmmm5z0ktz2vjsee9ue.0zek4i9.rcgu.o (dns block)
  • spsprodweu4.vssps.visualstudio.com
    • Triggering command: /home/REDACTED/work/ado-aw/ado-aw/target/debug/deps/ado_aw-3136f96127ec4700 /home/REDACTED/work/ado-aw/ado-aw/target/debug/deps/ado_aw-3136f96127ec4700 /home/REDACTED/work/ado-aw/ado-aw/target/debug/deps/codemod_tests-63d282cbd9099a7d.1nkssbntk2etip7qbmul5vt08.1tz163s.rcgu.o /home/REDACTED/work/ado-aw/ado-aw/target/debug/deps/codemod_tests-63d282cbd9099a7d.1pwtpr5rv2958uanfl895250m.1tz163s.rcgu.o /home/REDACTED/work/ado-aw/ado-aw/target/debug/deps/codemod_tests-63d282cbd9099a7d.1tqokxs2atpyyh0j3a5jx8yms.1tz163s.rcgu.o /home/REDACTED/work/ado-aw/ado-aw/target/debug/deps/codemod_tests-63d282cbd9099a7d.208vfdai33n0gpqg4l1kdlzo9.1tz163s.rcgu.o /home/REDACTED/work/ado-aw/ado-aw/target/debug/deps/codemod_tests-63d282cbd9099a7d.2jdi5ky8gjgd29ino23xlf9sv.1tz163s.rcgu.o 45.00hakwswftj6wrtz9lkb4tl5a.0zek4i9.rcgu.o 45.015fq4h8sttxoyzgnd1z46hc4.0zek4i9.rcgu.o 45.08anb3in7gs8u7so2z1uwo3e5.0zek4i9.rcgu.o 45.08hb6jqggco9c9o7ouh6hh26v.0zek4i9.rcgu.o 45.0gqfg4m63fpgcsla0fcelaguh.0zek4i9.rcgu.o 45.0yb5s6193a0wggzgv0v05kyl6.0zek4i9.rcgu.o 45.1ajfx1lje5gnhd2s2oittvglo.0zek4i9.rcgu.o 45.20awdpymdryjxm2uuz5t4bffl.0zek4i9.rcgu.o 45.24jvrpchkioadaybs0mkzmpxj.0zek4i9.rcgu.o 45.29d554253a7ua8jdy2djiuujt.0zek4i9.rcgu.o 45.2bikq83kqxzd59u4d0aoev3vs.0zek4i9.rcgu.o 45.2boqvweqww93ilwsx0c22c3r0.0zek4i9.rcgu.o 45.2d2vuk299nc534pripnny1rfr.0zek4i9.rcgu.o 45.2hkf7twmmm5z0ktz2vjsee9ue.0zek4i9.rcgu.o (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

@jamesadevine
Copy link
Copy Markdown
Collaborator Author

/rust-review

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 16, 2026

Rust PR Reviewer completed successfully!

@github-actions
Copy link
Copy Markdown
Contributor

🔍 Rust PR Review

Summary: Looks good — clean implementation with solid test coverage; one minor log-string inconsistency noted.

Findings

⚠️ Suggestions

  • src/mcp.rs:714 — Log message still says "Tool called: create_pr" while the parallel rename for reply-to-pr-comment correctly updated its log string to the wire name ("Tool called: reply-to-pr-comment"). Inconsistency only; no behavioral impact.

    // current
    info!("Tool called: create_pr - '{}'", params.0.title);
    // consistent with the rename pattern:
    info!("Tool called: create-pull-request - '{}'", params.0.title);

✅ What Looks Good

  • validate_safe_outputs_keys design is sound. The special-case ordering (invalid_namesNON_MCP_SAFE_OUTPUT_KEYSmemoryDEBUG_ONLY_TOOLSALL_KNOWN_SAFE_OUTPUTS) matches the comments exactly and avoids double-reporting the create-issue debug-only tool. Early bail on invalid characters is the right call.

  • related_safe_output_names over Levenshtein — the PR description explains the reasoning well, and the first-hyphen-segment grouping is provably correct: it lists every create-* candidate without the false hit of update-pr for create-pr. The test that verifies this exclusion (!related.contains(&"update-pr")) is a nice explicit guard.

  • reply-to-pr-review-commentreply-to-pr-comment rename is complete — updated across all four call sites: tool_result! macro declaration, mcp.rs #[tool(name = ...)], execute.rs dispatch match arm, get_tool_config lookup key, and all four unit tests. No leftover references found in src/, docs/, prompts/, or tests/.

  • create_pr.rscreate_pull_request.rs module rename is surgical: the file name moves to match the wire key, struct names (CreatePrResult, CreatePrParams, CreatePrConfig) are preserved (avoiding churn), and CreatePrResult::NAME == "create-pull-request" was already correct before this PR via the tool_result! macro. ALL_KNOWN_SAFE_OUTPUTS derives names from the type's NAME constant so the constant list required zero updates.

  • Validation is wired into all compile targetsvalidate_safe_outputs_keys is called inside compile_shared, which is reachable from standalone, 1ES, job, and stage paths.

  • 9 unit tests cover the full decision tree including the memory migration soft-path, the create-issue debug-only bypass, the all-invalid-keys-in-one-error accumulation, and the related_safe_output_names prefix logic.

Generated by Rust PR Reviewer for issue #563 · ● 1.5M ·

@jamesadevine
Copy link
Copy Markdown
Collaborator Author

@copilot implement suggestion

Agent-Logs-Url: https://github.com/githubnext/ado-aw/sessions/b3784ea7-b6cf-4671-9d6c-2371a37962e6

Co-authored-by: jamesadevine <4742697+jamesadevine@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

Copilot AI commented May 16, 2026

@copilot implement suggestion

Fixed in 9145e4f"Tool called: create_pr""Tool called: create-pull-request" to match the wire name convention used by the other tools.

@jamesadevine jamesadevine merged commit 29b221e into main May 16, 2026
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants